{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Import temperature data from the DWD and process it\n",
    "\n",
    "This notebook pulls historical temperature data from the DWD server and formats it for future use in other projects. The data is delivered in a hourly frequencs in a .zip file for each of the available weather stations. To use the data, we need everythin in a single .csv-file, all stations side-by-side. Also, we need the daily average.\n",
    "\n",
    "To reduce computing time, we also crop all data earlier than 2007. \n",
    "\n",
    "Files should be executed in the following pipeline:\n",
    "* 1-dwd_konverter_download\n",
    "* 2-dwd_konverter_extract\n",
    "* 3-dwd_konverter_build_df\n",
    "* 4-dwd_konverter_final_processing"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2.) Extract all .zip-archives\n",
    "In this next step, we extract a single file from all the downloaded .zip files and save them to the 'import' folder. Beware, there is going to be a lot of data (~6 GB of .csv files)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'Done'"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "from pathlib import Path\n",
    "import glob\n",
    "import re\n",
    "from zipfile import ZipFile\n",
    "\n",
    "# Folder definitions\n",
    "download_folder = Path.cwd() / 'download'\n",
    "import_folder = Path.cwd() / 'import'\n",
    "\n",
    "# Find all .zip files and generate a list\n",
    "unzip_files = glob.glob('download/stundenwerte_TU_*_hist.zip')\n",
    "\n",
    "# Set the name pattern of the file we need\n",
    "regex_name = re.compile('produkt.*')\n",
    "\n",
    "# Open all files, look for files that match ne regex pattern, extract to 'import'\n",
    "for file in unzip_files:\n",
    "    with ZipFile(file, 'r') as zipObj:\n",
    "        list_of_filenames = zipObj.namelist()\n",
    "        extract_filename = list(filter(regex_name.match, list_of_filenames))[0]\n",
    "        zipObj.extract(extract_filename, import_folder)\n",
    "\n",
    "display('Done')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
